Interactive map for AtlantECO project

Python
Author

Kate S [Ekaterina Sakharova] (MGnify team)

Mapping samples from the AtlantECO Super Study

… using the MGnify API and an interactive map widget

The MGnify API returns JSON data. The jsonapi_client package can help you load this data into Python, e.g. into a Pandas dataframe.

This example shows you how to load a MGnify Super Study’s data from the MGnify API and display it on an interactive world map.

You can find all of the other “API endpoints” using the Browsable API interface in your web browser. The URL you see in the browsable API is exactly the same as the one you can use in this code.
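As a minimal sketch of that correspondence: the endpoint strings used in this notebook are relative paths, and joining one onto the API base URL should give the address shown in the browsable API. The helper name `endpoint_url` is hypothetical and is not part of jsonapi_client.

```python
# Base URL used throughout this notebook.
API_BASE = "https://www.ebi.ac.uk/metagenomics/api/v1"


def endpoint_url(endpoint: str, base: str = API_BASE) -> str:
    """Join the API base URL and a relative endpoint path (hypothetical helper)."""
    return f"{base.rstrip('/')}/{endpoint.lstrip('/')}"


# The Super Study endpoint that is fetched later in this notebook.
atlanteco_url = endpoint_url("super-studies/atlanteco/flagship-studies")
print(atlanteco_url)
```

Pasting the printed address into a browser is a quick way to inspect the fields an endpoint returns before writing code against it.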

This is an interactive code notebook (a Jupyter Notebook). To run this code, click into each cell and press the ▶ button in the top toolbar, or press shift+enter.


Fetch all AtlantECO studies

A Super Study is a collection of MGnify Studies originating from a major project. AtlantECO is one such project, aiming to develop and apply a novel, unifying framework that provides knowledge-based resources for a better understanding and management of the Atlantic Ocean and its ecosystem services.

Fetch the Super Study’s Studies from the MGnify API, into a Pandas dataframe:

import pandas as pd
from jsonapi_client import Session, Modifier

atlanteco_endpoint = 'super-studies/atlanteco/flagship-studies'
with Session("https://www.ebi.ac.uk/metagenomics/api/v1") as mgnify:
    studies = map(lambda r: r.json, mgnify.iterate(atlanteco_endpoint))
    studies = pd.json_normalize(studies)
studies[:5]

Show the studies’ samples on a map

We can fetch the Samples for each Study and concatenate them into one DataFrame. Each sample carries geolocation data (latitude and longitude) in its attributes, which is what we need to build a map.
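Column names such as `attributes.longitude` come from flattening the nested JSON with `pd.json_normalize`. The sketch below shows this on a hand-written record; its shape is an assumption modelled on the fields this notebook uses, and the values are placeholders rather than real MGnify data.

```python
import pandas as pd

# A hand-written record with nested attributes. The values are
# placeholders for illustration, not real MGnify data.
record = {
    "id": "SAMPLE_1",
    "type": "samples",
    "attributes": {"latitude": 10.5, "longitude": -30.25},
}

# Nested keys are flattened into dot-separated column names.
flat = pd.json_normalize([record])
print(sorted(flat.columns))
```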

It takes time to fetch data for all samples, so let’s show samples from the first 6 studies only.

studies_samples = []

with Session("https://www.ebi.ac.uk/metagenomics/api/v1") as mgnify:
    for idx, study in studies[:6].iterrows():
        print(f"fetching {study.id} samples")
        samples = map(lambda r: r.json, mgnify.iterate(f'studies/{study.id}/samples?page_size=1000'))
        samples = pd.json_normalize(samples)
        samples = pd.DataFrame(data={
            'accession': samples['id'],
            'sample_id': samples['id'],
            'study': study.id, 
            'lon': samples['attributes.longitude'],
            'lat': samples['attributes.latitude'],
            'color': "#FF0000",
        })
        samples.set_index('accession', inplace=True)
        studies_samples.append(samples)
studies_samples = pd.concat(studies_samples)
print(f"fetched {len(studies_samples)} samples")

studies_samples.head()

import leafmap
m = leafmap.Map(center=(0, 0), zoom=2)
m.add_points_from_xy(
    studies_samples,
    x='lon', 
    y='lat', 
    popup=["study", "sample_id"], 
    color_column='color',
    add_legend=False
)
m
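The map call above assumes every row has usable coordinates. Whether any AtlantECO samples lack a latitude or longitude is not checked here, so the following is only a sketch of a clean-up step that could be run before plotting. It uses a toy table with made-up values in place of `studies_samples`.

```python
import pandas as pd

# Toy table standing in for studies_samples; all values are made up.
toy_samples = pd.DataFrame({
    "sample_id": ["S1", "S2", "S3"],
    "lon": ["-30.25", None, "12.0"],
    "lat": [10.5, 45.0, None],
})

# Coerce to numbers (unparseable or missing values become NaN),
# then keep only rows where both coordinates are present.
coords = toy_samples[["lon", "lat"]].apply(pd.to_numeric, errors="coerce")
plottable = toy_samples.assign(lon=coords["lon"], lat=coords["lat"])
plottable = plottable.dropna(subset=["lon", "lat"])
print(f"{len(plottable)} of {len(toy_samples)} samples have coordinates")
```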

Check GO term presence

Let’s check whether a specific identifier is present in each sample. This example is written for the GO term ‘GO:0015878’, but other identifier types are available on the MGnify API.

We will work with MGnify analyses (MGYAs) corresponding to chosen samples. We filter analyses by:

- pipeline version: 5.0
- experiment type: assembly

This example processes just the first 10 samples (again, because the full dataset takes a while to fetch). First, get the analyses for each sample.

analyses = []
with Session("https://www.ebi.ac.uk/metagenomics/api/v1") as mgnify:
    for idx, sample in studies_samples[:10].iterrows():
        print(f"processing {sample.sample_id}")
        filtering = Modifier(f"pipeline_version=5.0&sample_accession={sample.sample_id}&experiment_type=assembly")
        analysis = map(lambda r: r.json, mgnify.iterate('analyses', filter=filtering))
        analysis = pd.json_normalize(analysis)
        analyses.append(analysis)
analyses = pd.concat(analyses)
analyses[:5]
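The filter passed to `Modifier` above is an ordinary URL query string. As a sketch, it could also be built from a dictionary with the standard library, which percent-encodes any special characters in the values. The helper name `analysis_filter_query` and the accession used here are hypothetical.

```python
from urllib.parse import urlencode


def analysis_filter_query(sample_accession: str) -> str:
    """Build the query string used to filter analyses (hypothetical helper)."""
    return urlencode({
        "pipeline_version": "5.0",
        "sample_accession": sample_accession,
        "experiment_type": "assembly",
    })


query = analysis_filter_query("SAMPLE_1")  # placeholder accession, not a real one
print(query)
```

The resulting string has the same form as the f-string above, so it could be wrapped as `Modifier(query)` in the same way.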

Next, check each analysis for presence or absence of the GO term. We add a column to the dataframe with a colour: blue if the GO term was found and red if not.

identifier = "go-terms"
go_term = 'GO:0015878'
go_data = []
with Session("https://www.ebi.ac.uk/metagenomics/api/v1") as mgnify:
    for idx, mgya in analyses.iterrows():
        print(f"processing {mgya.id}")
        analysis_identifier = map(lambda r: r.json, mgnify.iterate(f'analyses/{mgya.id}/{identifier}'))
        analysis_identifier = pd.json_normalize(analysis_identifier)
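        # An analysis with no annotations yields an empty frame without an `id` column
        found_ids = set(analysis_identifier.get('id', []))
        go_data.append("#0000FF" if go_term in found_ids else "#FF0000")
analyses.insert(2, identifier, go_data, allow_duplicates=True)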
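One detail in the presence check is easy to miss: the `id` column is converted to a plain Python collection before testing membership. Applied directly to a pandas Series, `in` tests the index labels rather than the values. A small sketch with made-up contents:

```python
import pandas as pd

go_term = "GO:0015878"
# Stand-in for the `id` column of one analysis's GO-term table (made-up contents).
ids = pd.Series(["GO:0003677", "GO:0015878"])

in_index = go_term in ids         # tests the index labels (0 and 1), not the values
in_values = go_term in set(ids)   # tests the values themselves
colour = "#0000FF" if in_values else "#FF0000"
print(in_index, in_values, colour)
```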

Join the analyses and sample tables to have geolocation data and identifier presence data together.

We’ll create a new sub-DataFrame with a subset of the fields and add them to the map.

df = analyses.join(studies_samples.set_index('sample_id'), on='relationships.sample.data.id')
df2 = df[[identifier, 'lon', 'lat', 'study', 'attributes.accession', 'relationships.study.data.id', 'relationships.sample.data.id', 'relationships.assembly.data.id']].copy()
df2 = df2.set_index("study")
df2 = df2.rename(columns={"attributes.accession": "analysis_ID", 
                          'relationships.study.data.id': "study_ID",
                          'relationships.sample.data.id': "sample_ID", 
                          'relationships.assembly.data.id': "assembly_ID"
                         })
m = leafmap.Map(center=(0, 0), zoom=2)
m.add_points_from_xy(
    df2,
    x='lon',
    y='lat',
    popup=["study_ID", "sample_ID", "assembly_ID", "analysis_ID"],
    color_column=identifier,
    add_legend=False
)
m
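The join used above matches a column of the analyses table against the index of the samples table. The sketch below reproduces that pattern on toy data; all IDs and coordinates are made up. Because `DataFrame.join` performs a left join by default, an analysis whose sample is missing from the samples table is kept, with empty coordinates.

```python
import pandas as pd

# Toy stand-ins for the analyses and samples tables; all values are made up.
toy_analyses = pd.DataFrame({
    "attributes.accession": ["A1", "A2"],
    "relationships.sample.data.id": ["S1", "S2"],
})
toy_samples = pd.DataFrame({
    "sample_id": ["S1"],
    "lon": [-30.25],
    "lat": [10.5],
})

# `on` names a column in the left table; it is matched against the right table's index.
joined = toy_analyses.join(
    toy_samples.set_index("sample_id"),
    on="relationships.sample.data.id",
)
print(joined)
```

Rows like the second one here would have no position on the map, so it may be worth dropping them with `dropna(subset=["lon", "lat"])` before plotting.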